{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Model-Free Reinforcement Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Remember how, in the last notebook, it felt like cheating to use directions calculated from a map of the environment? Well, model-free reinforcement learning deals with that. \"Model-free\" means that the algorithms in this category do not need a model of the environment, that is, the transition and reward functions of the MDP, to find optimal policies.\n",
    "\n",
    "In this notebook, we will look at what is perhaps the most popular model-free reinforcement learning algorithm, Q-learning. Q-learning runs without a map of the environment; it works by balancing the need to explore with the need to exploit what it has already learned. Let's take a look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tempfile\n",
    "import pprint\n",
    "import math\n",
    "import json\n",
    "import sys\n",
    "import gym\n",
    "\n",
    "from gym import wrappers\n",
    "from subprocess import check_output\n",
    "from IPython.display import HTML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "#### Q-Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The function below, `action_selection`, is an important part of reinforcement learning algorithms. When you have two possibly conflicting needs, exploring and exploiting, you face a dilemma. The exploration vs. exploitation dilemma is at the core of reinforcement learning, and it is worth thinking about for a little while. How much do you need to explore an environment before you exploit it?\n",
    "\n",
    "In the function below we use one of many alternatives: we explore a lot at the beginning and decay the amount of exploration as the number of episodes increases. Let's take a look at what the function looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def action_selection(state, Q, episode, n_episodes):\n",
    "    epsilon = max(0, episode/n_episodes*2)\n",
    "    if np.random.random() < epsilon:\n",
    "        action = np.random.randint(len(Q[0]))\n",
    "    else:\n",
    "        action = np.argmax(Q[state])\n",
    "    return action, epsilon"
   ]
  },
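  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "To make the schedule concrete, here is a minimal sketch that applies the same epsilon formula to a 0-based episode index. It assumes the counter passed to `action_selection` starts at `n_episodes // 2` and counts down by one per episode, as in the loops used in this notebook. `linear_epsilon` is a hypothetical helper name, not part of the original code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def linear_epsilon(i, n_episodes):\n",
    "    # Hypothetical helper: epsilon for the i-th training episode (0-based).\n",
    "    # Mirrors action_selection when the counter runs from n_episodes // 2 downward.\n",
    "    counter = n_episodes // 2 - i\n",
    "    return max(0, counter / n_episodes * 2)\n",
    "\n",
    "# Print epsilon at the start, a quarter of the way in, halfway, and at the end.\n",
    "for i in (0, 2500, 5000, 9999):\n",
    "    print(i, linear_epsilon(i, 10000))"
   ]
  },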
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.text.Text at 0x7fc7a2dcb908>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGzNJREFUeJzt3X+cVPV97/HXWyhKRQFlc6uCApH0SpNc0I2o8VpSSwVs\n1URvA8aHJpfWq61NbGzvA41porWWpLF6vVp/NLGJ3iramBBqsJRrNJZGqUul/iKEFX9ttHVt1ESr\nF5HP/WPOjuMwZ3aWnXPmx3k/H4957Mz3e2b2c/Ys+2a+3zPfo4jAzMwMYI9WF2BmZu3DoWBmZmUO\nBTMzK3MomJlZmUPBzMzKHApmZlbmUDAzszKHgpmZlTkUzMysbGyrCxipKVOmxPTp01tdhplZR9m4\nceNLEdEz3HYdFwrTp0+nr6+v1WWYmXUUSc80sp2Hj8zMrMyhYGZmZQ4FMzMrcyiYmVmZQ8HMzMoy\nCwVJN0l6UdJjKf2SdLWkfkmPSDo8q1rMzKwxWb5T+DqwsE7/ImBWcjsbuC7DWlhwxX3MvPC7LLji\nviy/jZlZR8ssFCLifuAndTY5Gbg5Sh4EJkk6IItaFlxxH1sHX2dnwNbB15m+/LtZfBszs47XyjmF\ng4DnKh4PJG27kHS2pD5JfYODgyP+Rk++9PoubQ4GM7NdtTIUVKMtam0YETdGRG9E9Pb0DPsp7V28\nd8reNdsdDGZm79bKUBgAplU8ngo8n8U3WnfB/NS+wy6+O4tvaWbWkVoZCquBM5OzkI4CXo2IF7L6\nZlMn7VWz/Y0dO7l1w7NZfVszs46S5SmptwEPAL8oaUDSMknnSDon2WQNsA3oB/4S+J2sagFYv/z4\n1L6Lvv1olt/azKxjKKLmMH7b6u3tjdGsklpvHuHpFSfu9uuambUzSRsjone47Qr3ieZ6f/hneOLZ\nzAqucKEAMGfqxJrtAZz5tQ35FmNm1kYKGQqrzjs2te/+rS/lWImZWXspZChA/WEkf37BzIqqsKEA\nDgYzs2qFDgWAyz/6gdS+Y1fck2MlZmatV/hQOH3ewYwfW/vHMPDKmzlXY2bWWoUPBYDNly1K7fMw\nkpkViUMh4fkFMzOHwrvUC4b3fW5NjpWYmbWGQ6FK2sJ5298OVqzZnHM1Zmb5cihUqbdw3vX3b8ux\nEjOz/DkUavD8gpkVlUMhhYPBzIrIoVDHcbOmpPZ54Twz60YOhTpuXjav5oWkwQvnmVl3cigM4ykP\nI5lZgTgUGuD5BTMrCodCg845bmZq34cuW5djJWZm2XEoNGj54sMYN6b2DMPga9vZ+MzLOVdkZtZ8\nDoUR+NGfLE7tO/W6H+RYiZlZNhwKI+T5BTPrZg6F3VAvGA69yMFgZp3LobCbZvXsXbN9x068cJ6Z\ndSyHwm5ad8H81D4vnGdmncqhMAqeXzCzbuNQGCUHg5l1E4dCE9RbOO+Ua9bnWImZ2eg4FJrg5mXz\nUvs2DbyaYyVmZqPjUGgSDyOZWTfINBQkLZS0RVK/pOU1+g+WdK+khyU9Iin9I8MdwMFgZp0us1CQ\nNAa4FlgEzAaWSppdtdnFwB0RMRdYAvxFVvXk5fKPfiC179gV9+RYiZnZyGX5TuFIoD8itkXEdmAl\ncHLVNgHsm9yfCDyfYT25OH3ewUwYN6Zm38Arb+ZcjZnZyGQZCgcBz1U8HkjaKn0ROEPSALAG+L0M\n68nNY5cuTO3zMJKZtbMsQ6HWOtNR9Xgp8PWImAosBm6RtEtNks6W1Cepb3BwMINSm8/zC2bWibIM\nhQFgWsXjqew6PLQMuAMgIh4A9gJ2Oek/Im6MiN6I6O3p6cmo3Obzwnlm1mmyDIWHgFmSZkgaR2ki\neXXVNs8CxwNIOoxSKHTGW4EG1Vs47/yVD+dcjZlZfZmFQkTsAM4D1gKbKZ1l9LikSyWdlGx2AfDb\nkv4FuA34ZERUDzF1tHoL563a1PHz6m
bWZdRpf4N7e3ujr6+v1WWMWL15hHrDTGZmzSBpY0T0Dred\nP9GcE088m1kncCjk6JQ5B6b3eeE8M2sDDoUcXbVkbuoP3AvnmVk7cCjkbJuHkcysjTkUWsDzC2bW\nrhwKLVJv4bwPXbYux0rMzN7hUGiR0+cdzPixtX/8g69tZ+MzL+dckZmZQ6GlNl+2KLXv1Ot+kGMl\nZmYlDoUW8/yCmbUTh0Ib8MJ5ZtYuHAptYs7UiTXbd+yEFWs251yNmRWVQ6FNrDrv2NS+6+/flmMl\nZlZkDoU24vkFM2s1h0KbcTCYWSs5FNpQvYXzFlxxX36FmFnhOBTaUL2F87YOvp5rLWZWLA6FNuWF\n88ysFRwKbczzC2aWN4dCm7vz3GNS++ZcsjbHSsysCBwKbe6IQyYzafzYmn2vvLHDC+eZWVM5FDrA\npi+ckNrnhfPMrJkcCh3C8wtmlgeHQgepFwzvvdDBYGaj51DoMLN69q7Z/nbA+SsfzrkaM+s2DoUO\ns+6C+al9qzY9n18hZtaVHAodyPMLZpYVh0KHcjCYWRYcCh3snONmpvadcs36HCsxs27hUOhgyxcf\nxtiUI7hp4NV8izGzruBQ6HD9l3sYycyap6FQkHSnpBMlOUTakOcXzKxZGv0jfx1wOrBV0gpJ/7mR\nJ0laKGmLpH5Jy1O2+U1JT0h6XNKtDdZjVeotnPf+P/q7HCsxs07WUChExP+NiE8AhwNPA+sk/UDS\npyT9XK3nSBoDXAssAmYDSyXNrtpmFnAh8OGI+CXg/N3ek4Krt3Dea9vf5tYNz+ZckZl1ooaHgyTt\nD3wS+C3gYeB/UQqJdSlPORLoj4htEbEdWAmcXLXNbwPXRsTLABHx4oiqt3ept3DeRd9+NMdKzKxT\nNTqn8C3gH4CfB34jIk6KiNsj4veACSlPOwh4ruLxQNJW6X3A+yT9o6QHJS1M+f5nS+qT1Dc4ONhI\nyYXl+QUzG41G3yl8NSJmR8SfRsQLAJL2BIiI3pTnqEZbVD0eC8wC5gNLga9KmrTLkyJujIjeiOjt\n6elpsOTiqhcMMx0MZlZHo6FwWY22B4Z5zgAwreLxVKB6cZ4B4DsR8VZEPAVsoRQSNkpzpk6s2b4T\nL5xnZunqhoKkX5B0BDBe0lxJhye3+ZSGkup5CJglaYakccASYHXVNquAjyTfawql4aRtu7EfVmXV\necem93nhPDNLMdw7hROAr1D6X/6fA1ckt88CF9V7YkTsAM4D1gKbgTsi4nFJl0o6KdlsLfDvkp4A\n7gX+MCL+fXd3xt7N8wtmNlKKqB7mr7GRdGpE3JlDPcPq7e2Nvr6+VpfRUeoFQL3gMLPuIWljnTng\nsuGGj85I7k6X9NnqW1MqtcydMufA9D4vnGdmFYYbPhq6zNcEYJ8aN+sAVy2Zy5ha54LhhfPM7N0a\nGj5qJx4+2n0eRjIrrkaHj+qGgqSr6z05Ij69G7WNikNhdBwMZsXUlDkFYOMwN+sw9f7we+E8M6u9\ngloiIr6RVyGWn6mT9mLglTd3aR9aOO/0eQe3oCozawfDnX10VfL1byWtrr7lU6I12/rlx6f2eeE8\ns2Kr+04BuCX5+pWsC7F8Pb3ixNT5henLv+v5BbOCqvtOISI2Jl+/T2mto5eBnwAPJG3WwbxwnplV\na3Tp7BOBJ4GrgWuAfkmLsizM8uGF88ysUqOrpF4BfCQi5kfEL1NaxO7K7MqyvHjhPDOr1GgovBgR\n/RWPtwG+SlqX8MJ5ZjZkuLOPPibpY8DjktZI+qSks4C/pbQ0tnUJB4OZwfDvFH4jue0F/Bvwy5Su\nkjYITM60MsvdOcfNTO1bcMV9+RViZi0z3IfXPpVXIdZ6yxcfxk3/+BTb39516ZOtg6+3oCIzy1uj\nZx/tJel3Jf2FpJuGblkXZ/n70Z8sTu3zMJJZ92t0ovkW4BcoXYnt+5SuxPazrIqy1vL8gllxNRoK\nh0
bE54HXk/WQTgQ+kF1Z1mp3nntMat9hF9+dYyVmlqdGQ+Gt5Osrkt4PTASmZ1KRtYUjDplMz4Rx\nNfve2LGTFWs251yRmeWh0VC4UdJk4PPAauAJ4EuZVWVt4aGLF6T2XX//thwrMbO8NBQKEfHViHg5\nIr4fETMj4j0RcUPWxVnreX7BrFgaPftof0n/W9I/S9oo6SpJ+2ddnLWHesEww8Fg1lUaHT5aSWlZ\ni1OB04CXgNuzKsraz3GzptRsD+DMr23Itxgzy0yjobBfRPxxRDyV3C4DJmVZmLWXm5fNS+27f+tL\nOVZiZllqNBTulbRE0h7J7TcBjxsUjOcXzLrfcAvi/UzST4H/AdwKbE9uK4Hfz748azcOBrPuNtyV\n1/aJiH2Tr3tExNjktkdE7JtXkdZevHCeWfdqdPgISSdJ+kpy+/Usi7L2tnzxYYxN+c3xwnlmna3R\nU1JXAJ+h9KG1J4DPJG1WUP2XexjJrBs1+k5hMbAgIm6KiJuAhUmbFZjnF8y6T8PDR7z7FNTaV3uv\nImmhpC2S+iUtr7PdaZJCUu8I6rE2UC8YvHCeWedpNBT+FHhY0tclfQPYCFxe7wmSxgDXAouA2cBS\nSbNrbLcP8GnAn4DqULN69q7Z/saOndy64dmcqzGz0Rg2FCQJWA8cBXwruR0dESuHeeqRQH9EbIuI\nodNYT66x3R8DXwbeHEnh1j7WXTA/te+ibz+aXyFmNmrDhkJEBLAqIl6IiNUR8Z2I+NcGXvsg4LmK\nxwNJW5mkucC0iLhrJEVb+/H8gll3aHT46EFJHxrha6tGW/niv5L2AK4ELhj2haSzJfVJ6hscHBxh\nGZYXL5xn1vkaDYWPUAqGJyU9IulRSY8M85wBYFrF46nA8xWP9wHeD9wn6WlKw1Ora002R8SNEdEb\nEb09PT0Nlmyt4IXzzDrb2Aa3W7Qbr/0QMEvSDODHwBLg9KHOiHgVKP8FkXQf8AcR0bcb38vaxM3L\n5qUOF3nhPLP2N9zaR3tJOh/4Q0qfTfhxRDwzdKv33IjYAZwHrAU2A3dExOOSLpV0UpPqtzbk+QWz\nzqXSPHJKp3Q7pesz/wOldwvPRMRncqqtpt7e3ujr85uJTlAvAOoFh5k1n6SNETHsZ8GGm1OYHRFn\nJJfePA34r02pzgrh8o9+ILXv2BX35FiJmTVquFB4a+hOMhxk1rDT5x3M+JSV8wZe8cdSzNrRcKHw\nXyT9NLn9DPjg0P3kOgtmdW2+LP0cBc8vmLWf4a6nMCa5nsLQNRXGVtz39RSsIZ54NuscI1kQz2y3\neeE8s87gULDcTJ20V812L5xn1j4cCpab9cuPT+3zwnlm7cGhYLny/IJZe3MoWO4cDGbty6FgLXHK\nnANT+7xwnlnrOBSsJa5aMjf1l88L55m1jkPBWmabh5HM2o5DwVrK8wtm7cWhYC1Xb+G8D122LsdK\nzMyhYC1Xb+G8wde2s/GZl3OuyKy4HArWFuotnHfqdT/IsRKzYnMoWNvw/IJZ6zkUrK3UC4ZDL3Iw\nmGXNoWBtZ1bP3jXbd+yEFWs251yNWbE4FKztrLtgfmrf9fdvy68QswJyKFhb8vyCWWs4FKxtORjM\n8udQsLZ23KwpqX1eOM+s+RwK1tZuXjYPpfR54Tyz5nMoWNt7ysNIZrlxKFhH8PyCWT4cCtYx7jz3\nmNQ+L5xn1hwOBesYRxwymUnjx9bs88J5Zs3hULCOsukLJ6T2eeE8s9FzKFjH8fyCWXYcCtaRvHCe\nWTYyDQVJCyVtkdQvaXmN/s9KekLSI5LukXRIlvVYd/HCeWbNl1koSBoDXAssAmYDSyXNrtrsYaA3\nIj4IfBP4clb1WPfxwnlmzZflO4Ujgf6I2BYR24GVwMmVG0TEvRHxH8nDB4GpGdZjXcjzC2bNlWUo\nHAQ8V/F4IGlLswy4u1aHpLMl9UnqGxwcbGKJ1g0cDGbNk2Uo1Fqy
JmpuKJ0B9AJ/Vqs/Im6MiN6I\n6O3p6WliidYtTplzYHrfNetzrMSss2UZCgPAtIrHU4HnqzeS9KvA54CTIuL/ZViPdbGrlsxN/WXe\nNPBqrrWYdbIsQ+EhYJakGZLGAUuA1ZUbSJoL3EApEF7MsBYrgG0eRjIbtcxCISJ2AOcBa4HNwB0R\n8bikSyWdlGz2Z8AE4G8kbZK0OuXlzBri+QWz0VFEzWH+ttXb2xt9fX2tLsPa2K0bnuWibz9as69n\nwjgeunhBzhWZtZ6kjRHRO9x2/kSzdZ3T5x3MhHFjavZ54Tyz+hwK1pUeu3Rhap8XzjNL51CwruX5\nBbORcyhYV6sXDO+90MFgVs2hYF1vztSJNdvfDjh/5cM5V2PW3hwK1vVWnXdset+mXT5PaVZoDgUr\nBM8vmDXGoWCF4WAwG55DwQrFC+eZ1edQsELxwnlm9TkUrHC8cJ5ZOoeCFZLnF8xqcyhYYd157jGp\nfXMuWZtjJWbtw6FghXXEIZOZNH5szb5X3tjhhfOskBwKVmibvnBCap8XzrMicihY4Xl+wewdDgUz\nvHCe2RCHglnCC+eZORTMyrxwnplDwexdPL9gRedQMKviYLAicyiY1XDOcTNT+xZccV9+hZjlzKFg\nVsPyxYcxNuVfx9bB1/MtxixHDgWzFP2XexjJisehYFaH5xesaBwKZsPwwnlWJA4Fs2F44TwrEoeC\nWQO8cJ4VhUPBrEGeX7AicCiYjUC9YJjpYLAukGkoSFooaYukfknLa/TvKen2pH+DpOlZ1mPWDGkL\n5+3EC+dZ58ssFCSNAa4FFgGzgaWSZldttgx4OSIOBa4EvpRVPWbN4oXzrJvVPqWiOY4E+iNiG4Ck\nlcDJwBMV25wMfDG5/03gGkmKiMiwLrNRe3rFianzCJ5fsCzNmTqx7n9MRivL4aODgOcqHg8kbTW3\niYgdwKvA/hnWZNY09eYXzLKyaeBVTrlmfWavn2UoqEZb9TuARrZB0tmS+iT1DQ4ONqU4s2aot3Ce\nWVYee/6nmb12lqEwAEyreDwVqB5wLW8jaSwwEfhJ9QtFxI0R0RsRvT09PRmVazZy9RbOM8vK+w/c\nN7PXzvLX+SFglqQZksYBS4DVVdusBs5K7p8GfM/zCdZp+i8/kQnjxrS6DCuIrOcUMptojogdks4D\n1gJjgJsi4nFJlwJ9EbEa+Bpwi6R+Su8QlmRVj1mWHrt0YatLMGuKLM8+IiLWAGuq2v6o4v6bwH/L\nsgYzM2ucR0PNzKzMoWBmZmUOBTMzK3MomJlZmUPBzMzK1GkfC5A0CDyzm0+fArzUxHI6gfe5GLzP\nxTCafT4kIob99G/HhcJoSOqLiN5W15En73MxeJ+LIY999vCRmZmVORTMzKysaKFwY6sLaAHvczF4\nn4sh830u1JyCmZnVV7R3CmZmVkdhQkHSQklbJPVLWt7qenaXpGmS7pW0WdLjkj6TtO8naZ2krcnX\nyUm7JF2d7Pcjkg6veK2zku23Sjor7Xu2C0ljJD0s6a7k8QxJG5L6b0+WaEfSnsnj/qR/esVrXJi0\nb5F0Qmv2pDGSJkn6pqQfJsf76G4/zpJ+P/m9fkzSbZL26rbjLOkmSS9KeqyirWnHVdIRkh5NnnO1\npFoXM0sXEV1/o7R095PATGAc8C/A7FbXtZv7cgBweHJ/H+BHwGzgy8DypH058KXk/mLgbkpXuTsK\n2JC07wdsS75OTu5PbvX+DbPvnwVuBe5KHt8BLEnuXw+cm9z/HeD65P4S4Pbk/uzk2O8JzEh+J8a0\ner/q7O83gN9K7o8DJnXzcaZ0ed6ngPEVx/eT3XacgeOAw4HHKtqadlyBfwKOTp5zN7BoRPW1+geU\n00E4Glhb8fhC4MJW19WkffsOsADYAhyQtB0AbEnu3wAsrdh+S9K/FLihov1d27XbjdKV++4BfgW4\nK/mFfwkYW32MKV3D4+jk/thk
O1Uf98rt2u0G7Jv8gVRVe9ceZ965Zvt+yXG7CzihG48zML0qFJpy\nXJO+H1a0v2u7Rm5FGT4a+mUbMpC0dbTk7fJcYAPwnyLiBYDk63uSzdL2vdN+JlcB/xPYmTzeH3gl\nInYkjyvrL+9b0v9qsn0n7fNMYBD4q2TI7KuS9qaLj3NE/Bj4CvAs8AKl47aR7j7OQ5p1XA9K7le3\nN6wooVBrTK2jT7uSNAG4Ezg/IupdxTtt3zvmZyLp14EXI2JjZXONTWOYvo7ZZ0r/8z0cuC4i5gKv\nUxpWSNPx+5yMo59MacjnQGBvYFGNTbvpOA9npPs46n0vSigMANMqHk8Fnm9RLaMm6ecoBcJfR8S3\nkuZ/k3RA0n8A8GLSnrbvnfQz+TBwkqSngZWUhpCuAiZJGrp6YGX95X1L+idSutxrJ+3zADAQERuS\nx9+kFBLdfJx/FXgqIgYj4i3gW8AxdPdxHtKs4zqQ3K9ub1hRQuEhYFZyFsM4SpNSq1tc025JziT4\nGrA5Iv68oms1MHQGwlmU5hqG2s9MzmI4Cng1eXu6Fvg1SZOT/6H9WtLWdiLiwoiYGhHTKR2770XE\nJ4B7gdOSzar3eehncVqyfSTtS5KzVmYAsyhNyrWdiPhX4DlJv5g0HQ88QRcfZ0rDRkdJ+vnk93xo\nn7v2OFdoynFN+n4m6ajkZ3hmxWs1ptUTLjlO7CymdKbOk8DnWl3PKPbjWEpvBx8BNiW3xZTGUu8B\ntiZf90u2F3Btst+PAr0Vr/Xfgf7k9qlW71uD+z+fd84+mknpH3s/8DfAnkn7Xsnj/qR/ZsXzP5f8\nLLYwwrMyWrCvc4C+5FivonSWSVcfZ+AS4IfAY8AtlM4g6qrjDNxGac7kLUr/s1/WzOMK9CY/vyeB\na6g6WWG4mz/RbGZmZUUZPjIzswY4FMzMrMyhYGZmZQ4FMzMrcyiYmVmZQ8EKT9LbkjZV3Oquoivp\nHElnNuH7Pi1pymhfx6yZfEqqFZ6k1yJiQgu+79OUzjt/Ke/vbZbG7xTMUiT/k/+SpH9Kbocm7V+U\n9AfJ/U9LeiJZ635l0rafpFVJ24OSPpi07y/p75MF7m6gYp0aSWck32OTpBskjWnBLps5FMyA8VXD\nRx+v6PtpRBxJ6ZOhV9V47nJgbkR8EDgnabsEeDhpuwi4OWn/ArA+SgvcrQYOBpB0GPBx4MMRMQd4\nG/hEc3fRrDFjh9/ErOu9kfwxruW2iq9X1uh/BPhrSasoLUUBpaVITgWIiO8l7xAmUrq4yseS9u9K\nejnZ/njgCOCh5CJZ43lnQTSzXDkUzOqLlPtDTqT0x/4k4POSfon6yxfXeg0B34iIC0dTqFkzePjI\nrL6PV3x9oLJD0h7AtIi4l9IFgCYBE4D7SYZ/JM0HXorSNS8q2xdRWuAOSgugnSbpPUnffpIOyXCf\nzFL5nYJZMqdQ8fjvImLotNQ9JW2g9B+opVXPGwP8n2RoSMCVEfGKpC9SumLaI8B/8M6SyJcAt0n6\nZ+D7lJaKJiKekHQx8PdJ0LwF/C7wTLN31Gw4PiXVLIVPGbUi8vCRmZmV+Z2CmZmV+Z2CmZmVORTM\nzKzMoWBmZmUOBTMzK3MomJlZmUPBzMzK/j/JwetfrmisdgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fc7a13e7cc0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "Q = [[0]]\n",
    "n_episodes = 10000\n",
    "epsilons = []\n",
    "for episode in range(n_episodes//2, -n_episodes//2, -1):\n",
    "    _, epsilon = action_selection(0, Q, episode, n_episodes)\n",
    "    epsilons.append(epsilon)\n",
    "plt.plot(np.arange(len(epsilons)), epsilons, '.')\n",
    "plt.ylabel('Probability')\n",
    "plt.xlabel('Episode')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "See that? At episode 0 we have a 100% chance of acting randomly, and that chance falls linearly to 0 by the halfway point, when we stop exploring and instead always select the action that we think maximizes the discounted future reward.\n",
    "\n",
    "Again, this is just one way of doing it. There are many others, and you should certainly think about better ones.\n",
    "\n",
    "Next, let me show you what Q-Learning looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def q_learning(env, alpha=0.9, gamma=0.9):\n",
    "    nS = env.env.observation_space.n\n",
    "    nA = env.env.action_space.n\n",
    "    \n",
    "    # Optimistic random initialization: every Q(s,a) starts in [0, 2)\n",
    "    Q = np.random.random((nS, nA)) * 2.0\n",
    "    n_episodes = 10000\n",
    "    \n",
    "    # The counter runs from n_episodes/2 downward so that epsilon decays from 1 to 0\n",
    "    for episode in range(n_episodes//2, -n_episodes//2, -1):\n",
    "        state = env.reset()\n",
    "        done = False\n",
    "        while not done:\n",
    "            action, _ = action_selection(state, Q, episode, n_episodes)\n",
    "            nstate, reward, done, info = env.step(action)\n",
    "            # Move Q(s,a) toward reward + gamma * max Q(nstate, .); (not done) drops the bootstrap term on terminal transitions\n",
    "            Q[state][action] += alpha * (reward + gamma * Q[nstate].max() * (not done) - Q[state][action])\n",
    "            state = nstate\n",
    "    return Q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Nice, right? You just pass it an environment; `nS` and `nA` are the number of states and actions, respectively.\n",
    "\n",
    "Q is a table with states as rows and actions as columns; it holds the return the agent expects to get for taking action 'a' in state 's'. You can see that we initialize each Q(s,a) to a random value, and that we also multiply it by 2. Why? This is called \"optimism in the face of uncertainty\", and it is a common reinforcement learning technique for encouraging agents to explore. In FrozenLake the only reward is 1 for reaching the goal, so initial values of up to 2 tend to overestimate what the agent can actually earn. Think about it on an intuitive level. If you think positively most of the time and you receive a lowball job offer, you are going to pass on it and potentially get a better offer later. Worst case, you don't find a better offer, and after 'adjusting' your estimates you conclude that the lowball offer wasn't that bad after all. The same applies to a reinforcement learning agent. Cool, right?\n",
    "\n",
    "Then, I loop for `n_episodes`, using the `action_selection` function described above. Don't pay too much attention to the start and end of the range; that is just how I get the exploration schedule shown earlier. You should not like it; I don't like it. You will have a chance to make it better.\n",
    "\n",
    "For now, let's unleash this agent and see how it does!!!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2017-04-26 14:54:11,087] Making new env: FrozenLake-v0\n",
      "[2017-04-26 14:54:11,092] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000000.json\n",
      "[2017-04-26 14:54:11,094] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000001.json\n",
      "[2017-04-26 14:54:11,098] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000008.json\n",
      "[2017-04-26 14:54:11,104] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000027.json\n",
      "[2017-04-26 14:54:11,116] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000064.json\n",
      "[2017-04-26 14:54:11,130] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000125.json\n",
      "[2017-04-26 14:54:11,154] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000216.json\n",
      "[2017-04-26 14:54:11,183] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000343.json\n",
      "[2017-04-26 14:54:11,219] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000512.json\n",
      "[2017-04-26 14:54:11,263] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video000729.json\n",
      "[2017-04-26 14:54:11,328] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video001000.json\n",
      "[2017-04-26 14:54:11,586] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video002000.json\n",
      "[2017-04-26 14:54:11,913] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video003000.json\n",
      "[2017-04-26 14:54:12,217] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video004000.json\n",
      "[2017-04-26 14:54:12,717] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video005000.json\n",
      "[2017-04-26 14:54:13,719] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video006000.json\n",
      "[2017-04-26 14:54:14,796] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video007000.json\n",
      "[2017-04-26 14:54:15,840] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video008000.json\n",
      "[2017-04-26 14:54:16,913] Starting new video recorder writing to /tmp/tmpe11j2aj2/openaigym.video.0.93.video009000.json\n"
     ]
    }
   ],
   "source": [
    "mdir = tempfile.mkdtemp()\n",
    "env = gym.make('FrozenLake-v0')\n",
    "env = wrappers.Monitor(env, mdir, force=True)\n",
    "\n",
    "Q = q_learning(env)"
   ]
  },
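  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "To see what a single update inside `q_learning` does, here is a minimal sketch on a hand-made two-state table. The table values and the transition are made up for illustration; the learning rate and discount use the defaults from `q_learning`, and the `_toy` names are hypothetical so that they do not overwrite the trained `Q`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "alpha_toy, gamma_toy = 0.9, 0.9  # same defaults as q_learning\n",
    "\n",
    "# Hand-made table: 2 states x 2 actions (illustrative values only)\n",
    "Q_toy = np.array([[0.0, 0.0],\n",
    "                  [0.5, 1.0]])\n",
    "\n",
    "# One made-up transition: in state 0, take action 1, get reward 0, land in state 1\n",
    "state_toy, action_toy, reward_toy, nstate_toy, done_toy = 0, 1, 0.0, 1, False\n",
    "\n",
    "# Target: immediate reward plus the discounted best value of the next state\n",
    "td_target = reward_toy + gamma_toy * Q_toy[nstate_toy].max() * (not done_toy)\n",
    "# Error: how far the current estimate is from the target\n",
    "td_error = td_target - Q_toy[state_toy][action_toy]\n",
    "# Move the estimate a fraction alpha of the way toward the target\n",
    "Q_toy[state_toy][action_toy] += alpha_toy * td_error\n",
    "\n",
    "print('target:', td_target, 'error:', td_error, 'updated:', Q_toy[state_toy][action_toy])"
   ]
  },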
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Let's look at a couple of the episodes in more detail."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "videos = np.array(env.videos)\n",
    "n_videos = 5\n",
    "\n",
    "idxs = np.linspace(0, len(videos) - 1, n_videos).astype(int)\n",
    "videos = videos[idxs,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "urls = []\n",
    "for i in range(n_videos):\n",
    "    out = check_output([\"asciinema\", \"upload\", videos[i][0]])\n",
    "    out = out.decode(\"utf-8\").replace('\\n', '').replace('\\r', '')\n",
    "    urls.append([out])\n",
    "videos = np.concatenate((videos, urls), axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <h2>Episode 0\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/3dv9izu19pspkw388dprzyu6o.js\" \n",
       "        id=\"asciicast-3dv9izu19pspkw388dprzyu6o\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 64\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/7ey2lcpco213cfwialrpuwx8u.js\" \n",
       "        id=\"asciicast-7ey2lcpco213cfwialrpuwx8u\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 729\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/5j0pv593cx2sfoiirwufn9n9i.js\" \n",
       "        id=\"asciicast-5j0pv593cx2sfoiirwufn9n9i\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 4000\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/07vdo951znh9zw45jf5xx6vvf.js\" \n",
       "        id=\"asciicast-07vdo951znh9zw45jf5xx6vvf\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 9000\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/b94b3x0dp1a7929yxgwxf994f.js\" \n",
       "        id=\"asciicast-b94b3x0dp1a7929yxgwxf994f\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "strm = ''\n",
    "for video_path, meta_path, url in videos:\n",
    "\n",
    "    with open(meta_path) as data_file:    \n",
    "        meta = json.load(data_file)\n",
    "    castid = url.split('/')[-1]\n",
    "    html_tag = \"\"\"\n",
    "    <h2>{0}\n",
    "    <script type=\"text/javascript\" \n",
    "        src=\"https://asciinema.org/a/{1}.js\" \n",
    "        id=\"asciicast-{1}\" \n",
    "        async data-autoplay=\"true\" data-size=\"big\">\n",
    "    </script>\n",
    "    \"\"\"\n",
    "    strm += html_tag.format('Episode ' + str(meta['episode_id']),\n",
    "                               castid)\n",
    "HTML(data=strm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Nice!!!\n",
    "\n",
    "You can see the progress of this agent: from total chaos, constantly sinking into holes, to sliding into the goal fairly consistently.\n",
    "\n",
    "Let's inspect the Values and Policies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.03843929,  0.00907311,  0.05889145,  0.01216185,  0.07912733,\n",
       "        1.81991842,  0.01744396,  1.7286877 ,  0.09666762,  0.35824406,\n",
       "        0.66550044,  1.76576476,  1.89998862,  0.71040231,  0.97740312,\n",
       "        1.23591297])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "V = np.max(Q, axis=1)\n",
    "V"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 0, 0, 3, 2, 1, 1])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pi = np.argmax(Q, axis=1)\n",
    "pi"
   ]
  },
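  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The flat policy array is easier to read laid out on the map. The sketch below assumes the 4x4 FrozenLake layout and Gym's action encoding for this environment (0 = left, 1 = down, 2 = right, 3 = up). `pi_example` is a copy of the array printed above, and `arrows` is an illustrative name. Keep in mind that the entries for holes and for the goal carry no meaning: episodes end in those states, so their Q-values are never updated and still hold their random initial values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Copy of the greedy policy printed above (one action per state, row-major)\n",
    "pi_example = np.array([0, 3, 3, 3, 0, 1, 2, 3, 3, 1, 0, 0, 3, 2, 1, 1])\n",
    "\n",
    "# Index i holds the symbol for action i: left, down, right, up\n",
    "arrows = np.array(['<', 'v', '>', '^'])\n",
    "\n",
    "# Map each action to its symbol and reshape to the 4x4 grid\n",
    "print(arrows[pi_example].reshape(4, 4))"
   ]
  },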
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Fair enough. Let's close this environment, and then you will have a chance to submit your results to your OpenAI account. After that, you can modify `action_selection` to try something different."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "env.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2017-04-02 00:40:50,777] [FrozenLake-v0] Uploading 10000 episodes of training data\n",
      "[2017-04-02 00:40:52,390] [FrozenLake-v0] Uploading videos of 19 training episodes (1810 bytes)\n",
      "[2017-04-02 00:40:52,639] [FrozenLake-v0] Creating evaluation object from /tmp/tmpww4igizw with learning curve and training video\n",
      "[2017-04-02 00:40:52,874] \n",
      "****************************************************\n",
      "You successfully uploaded your evaluation on FrozenLake-v0 to\n",
      "OpenAI Gym! You can find it at:\n",
      "\n",
      "    https://gym.openai.com/evaluations/eval_hGg7u5NwS1a35elvMZC0Q\n",
      "\n",
      "****************************************************\n"
     ]
    }
   ],
   "source": [
    "gym.upload(mdir, api_key='<YOUR OPENAI API KEY>')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Your turn\n",
    "\n",
    "Maybe you want to try an exponential decay? (http://www.miniwebtool.com/exponential-decay-calculator/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "$P(t) = P_0 e^{-rt}$\n",
    "\n",
    "where:\n",
    "* $P(t)$ = the amount of some quantity at time $t$\n",
    "* $P_0$ = the initial amount at time $t = 0$\n",
    "* $r$ = the decay rate\n",
    "* $t$ = time (number of periods)"
   ]
  },
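  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As a worked example of the formula, the sketch below evaluates it with an initial value of 1 and a decay rate of 0.0006 (the defaults used in the next cell), assuming the episode counter `t` starts at 0 and counts up. The variable names are illustrative. Note that if the counter instead runs from positive to negative, as in the `q_learning` loop above, the same formula grows over training rather than shrinking, so the counter would need adjusting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "eps0, decay_rate = 1.0, 0.0006  # illustrative names for the initial value and the decay rate\n",
    "\n",
    "# Epsilon at a few points in training, with t counted up from 0\n",
    "for t in (0, 1000, 5000, 10000):\n",
    "    print(t, round(eps0 * math.exp(-decay_rate * t), 4))\n",
    "\n",
    "# Number of episodes it takes epsilon to halve: ln(2) / r\n",
    "print('half-life:', round(math.log(2) / decay_rate, 1))"
   ]
  },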
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def action_selection(state, Q, episode, n_episodes, decay=0.0006, initial=1.00):\n",
    "    epsilon = initial * math.exp(-decay*episode)\n",
    "    if np.random.random() < epsilon:\n",
    "        action = np.random.randint(len(Q[0]))\n",
    "    else:\n",
    "        action = np.argmax(Q[state])\n",
    "    return action, epsilon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Use the following code to test your new exploration strategy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.text.Text at 0x7fc798c3ab00>"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHGdJREFUeJzt3XucXGWd5/HPN93TwBAhXNoVCRCyxlkiugm0XBSZOEwk\nFwVHnSFBF51hzKLDKCM7+wqIFxjEeEEzLCyXBVZghOiIxoxGMyyCGVbIpDPJcEmMaUKAFmdoFFAQ\nNgR++0edNEX3qUt316lTVef7fr3q1VVPnTr1O31CfznPOed5FBGYmZkBTMq7ADMzax0OBTMzG+ZQ\nMDOzYQ4FMzMb5lAwM7NhDgUzMxvmUDAzs2EOBTMzG+ZQMDOzYd15FzBWBx54YEybNi3vMszM2sqG\nDRueiIjeWsu1XShMmzaN/v7+vMswM2srkh6uZzl3H5mZ2TCHgpmZDXMomJnZMIeCmZkNcyiYmdmw\nzEJB0vWSHpd0f4X3JekySQOS7pV0VFa1mJlZfbI8UvgaMK/K+/OBGcljCXBlhrWwbPUW5nzpDpat\n3pLl15iZtbXMQiEi1gK/qrLIqcCNUXIPMEXSQVnUsmz1Fq5au50dv/wtV63dztxL78zia8zM2l6e\n5xQOBh4tez2YtI0iaYmkfkn9Q0NDY/6iG+/e8YrX24ae5ZwVG8e8HjOzTpdnKCilLdIWjIhrIqIv\nIvp6e2vepT3KiylrXbnpsTGvx8ys0+UZCoPAIWWvpwKZ/KWef+RrslitmVnHyTMUVgFnJFchHQc8\nHRG/yOKLli+ancVqzcw6TpaXpN4C3A38nqRBSWdKOkvSWckiq4HtwADwv4CPZlVLJWdct67ZX2lm\n1tIyGyU1IhbXeD+Av8jq+0eaJHhpxLmFtdueaNbXm5m1hcLc0bzkbdPzLsHMrOUVJhSWLjgi7xLM\nzFpeYUKhEt/IZmb2skKFwh7dozd329CzOVRiZtaaChUKn3nXG/IuwcyspRUqFE4/9tDU9g0PP9nk\nSszMWlOhQqGS/3LtPXmXYGbWEgoXClOn7Dmq7bcvvJRDJWZmradwoXDX0pPyLsHMrGUVLhQq8VDa\nZmYOhWEeStvMrKChcOKMA/MuwcysJRUyFG4889i8SzAza0mFDIVKTlh2e94lmJnlqrChsHdP16i2\nwaeez6ESM7PWUdhQcBeSmdlohQ2Fow/bL7Xds7GZWZEVNhQq8WxsZlZkhQ4FX5pqZvZKhQ4Fn1cw\nM3ulQodCJbMuXJN3CWZmuSh8KPRO7hnV9tRzu3KoxMwsf4UPhfUXzM27BDOzllH4UKhk7qV35l2C\nmVnTORSASRrdtm3o2eYXYmaWM4cCsORt0/MuwcysJTgUgKULjkht98Q7ZlY0DoUqPPGOmRWNQyEx\na+q+eZdgZpY7h0Ji5dknpLZvePjJJldiZpafTENB0jxJWyUNSFqa8v6hku6QtFHSvZIWZFnPeCy6\n+id5l2Bm1jSZhYKkLuAKYD4wE1gsaeaIxS4AvhkRs4FFwP/Mqp56pN3d/MJLORRiZpaTLI8UjgEG\nImJ7ROwEVgCnjlgmgH2S5/sCuZ7Z9d3NZlZ0WYbCwcCjZa8Hk7ZynwU+IGkQWA38ZYb1jJsHyDOz\nosgyFFLuEyZGvF4MfC0ipgILgJskjapJ0hJJ/ZL6h4aGMij1ZXt2j/6VeIA8MyuKLENhEDik7PVU\nRncPnQl8EyAi7gb2BEbNfBMR10REX0T09fb2ZlRuydc/fFym6zcza2VZhsJ6YIakwyX1UDqRvGrE\nMo8AJwFIOoJSKGR7KFBDpbmb33zxbU2uxMys+TILhYjYBZwNrAG2ULrK6AFJF0k6JVnsXODDkv4V\nuAX4UESM7GJquu6UEfKGntmZQyVmZs3VneXKI2I1pRPI5W2fLnu+GXhrljWMx0WnHsn537kv7zLM\nzJrOdzSnOP3YQ1PbfRWSmXU6h0IFPV
2ju5B8FZKZdTqHQgW3LDk+7xLMzJrOoVBBpauQ3IVkZp3M\noVCFu5DMrGgcClW4C8nMisahUEWlLqQjP/3DJldiZtYcDoUa0rqQntn5Yg6VmJllz6FQQ6UuJM/I\nZmadyKFQQ6UuJM/IZmadyKFQh8k9XaPaPCObmXUih0Id7r9oXmr7stVbmlyJmVm2HAoTcNXa7XmX\nYGbWUA6FOvVO7sm7BDOzzDkU6rT+grmp7e++/K4mV2Jmlh2HwgRtGnw67xLMzBrGoTAGs6bum3cJ\nZmaZciiMwcqzT0ht98ipZtYpHApjlDJ9s0dONbOO4VAYo4vf/cbU9pvXPdLkSszMGs+hMEaV5m8+\n/zv3NbkSM7PGcyiMw5S9uvMuwcwsEw6Fcdj0mZNT2+deemdzCzEzazCHQgNtG3o27xLMzCbEoTBO\nJ844MO8SzMwazqEwTjeeeWxq++s/ubrJlZiZNY5DYQLSpurc+WLkUImZWWM4FCbgZ59bkNruE85m\n1q4cChnwCWcza1cOhQma0bt3avuGh59sciVmZhNXVyhIulXSQkkOkRFuO3dOavsfX/mT5hZiZtYA\n9f6RvxI4HdgmaZmk/1TPhyTNk7RV0oCkpRWW+RNJmyU9IOnmOutpKWknnF/KoQ4zs4mqKxQi4v9E\nxPuBo4AdwG2SfiLpTyX9TtpnJHUBVwDzgZnAYkkzRywzAzgPeGtEvAE4Z9xbkqNKJ5zffPFtTa7E\nzGxi6u4OknQA8CHgz4GNwN9SColKf/mOAQYiYntE7ARWAKeOWObDwBUR8SRARDw+pupb3NAzO/Mu\nwcxsTOo9p/Bt4J+A3wXeFRGnRMQ3IuIvgckVPnYw8GjZ68GkrdzrgddL+r+S7pE0b2zlt45Kdzif\ncd26JldiZjZ+9R4pXBsRMyPi8xHxCwBJewBERF+Fz6RMR8PIO7u6gRnAHGAxcK2kKaNWJC2R1C+p\nf2hoqM6Sm6vSHc5rtz3R5ErMzMav3lC4OKXt7hqfGQQOKXs9FXgsZZnvRsQLEfEQsJVSSLxCRFwT\nEX0R0dfb21tnyc3XO7kn7xLMzCakaihIeo2ko4G9JM2WdFTymEOpK6ma9cAMSYdL6gEWAatGLLMS\neHvyXQdS6k7aPo7taAnrL5ib2v4fz/t+kysxMxufWrPFnEzp5PJU4Ctl7b8Bzq/2wYjYJelsYA3Q\nBVwfEQ9Iugjoj4hVyXvvkLQZeBH464j45bi2pEV0T4JdI65H9XBIZtYuFFH7L5ak90bErU2op6a+\nvr7o7+/Pu4yqpi0dfWQwuaeL+y9q2/PoZtbmJG2ocg54WNUjBUkfiIi/A6ZJ+sTI9yPiKykfsxTP\n7Hwx7xLMzGqqdaJ598A+k4FXpTwsxbtnvTa1/YRltze5EjOzsal6pBARVyc/L2xOOZ1h+aLZrNw0\n8kIrGHzq+RyqMTOrX63uo8uqvR8RH2tsOZ1jRu/eqUNon3Hduor3NJiZ5a3W1UcbmlJFB7rt3Dmp\nJ5x9M5uZtbJa3Uc3NKuQTjRlr26eem7XqPZlq7ewdMEROVRkZlZdrZvXlic//0HSqpGP5pTYvjZ9\n5uTU9qvWtu39eWbW4Wp1H92U/Pxy1oV0qr26J/HcyLvZgJvXPcLpxx6aQ0VmZpVVPVKIiA3Jzx9T\nGuvoSeBXwN1Jm9Ww5eL5qe3nf+e+JldiZlZbvUNnLwQeBC4DLgcGJKX/tbNRuiv8lm9e90hzCzEz\nq6HeUVIvBd4eEXMi4vcpDWL31ezK6iwDlyxMbffRgpm1mnpD4fGIGCh7vR3oqFnSspYyjTMAGx5+\nsrmFmJlVUevqo/dIeg/wgKTVkj4k6YPAP1AaGtvq9ODn048W3nvlT5pciZlZZbWOFN6VPPYE/h34\nfUqzpA0B+2VaWQeqdLTgcwtm1irqGjq7lbTD0NnVpN3lDLBjWfqRhJlZIzRk6Oyyle0JnAm8gdJR\nAw
AR8WfjrrCgupQ+6Y7vcjazVlDvieabgNdQmontx5RmYvtNVkV1skrnFnyXs5m1gnpD4XUR8Sng\n2WQ8pIXAG7Mrq7NVum/hnBUbm1uImdkI9YbCC8nPpyQdCewLTMukogKodN9C2hwMZmbNVG8oXCNp\nP+BTwCpgM/CFzKoqgJ4KlyLNvfTO5hZiZlamrlCIiGsj4smI+HFETI+IV++elc3G52efW5DanjYx\nj5lZs9Q79tEBkv6HpH+RtEHSckkHZF1cp5vc05XaPuvCNU2uxMyspN7uoxWUhrV4L/A+4AngG1kV\nVRT3XzQvtT1tYh4zs2aoNxT2j4i/iYiHksfFwJQsCyuKqVP2TG2fXuEmNzOzLNUbCndIWiRpUvL4\nE8B/tRrgrqUnpba/hAfLM7PmqzUg3m8k/Rr4r8DNwM7ksQL4q+zLK4azTpye2u7B8sys2WrNvPaq\niNgn+TkpIrqTx6SI2KdZRXa6asNb+BJVM2umeruPkHSKpC8nj3dmWVQRVRoQz5eomlkz1XtJ6jLg\n45RuWtsMfDxpswaqdEPb6z+5usmVmFlR1XuksACYGxHXR8T1wLykzRqo0g1tO9OGVTUzy0Dd3Ue8\n8hLUfev5gKR5krZKGpC0tMpy75MUkmqO9d3pZvTundpeaR4GM7NGqjcUPg9slPQ1STcAG4BLqn1A\nUhdwBTAfmAksljQzZblXAR8D1o2l8E5127lzKr7nk85mlrWaoSBJwF3AccC3k8fxEbGixkePAQYi\nYntE7L6M9dSU5f4G+CLw/FgK72Q+6WxmeakZClGar3NlRPwiIlZFxHcj4t/qWPfBwKNlrweTtmGS\nZgOHRMT3xlJ0EVQ66exuJDPLUr3dR/dIevMY1532V234jKmkScBXgXNrrkhaIqlfUv/Q0NAYy2hP\nlU46A5xxnXvazCwb9YbC2ykFw4OS7pV0n6R7a3xmEDik7PVUoHwWmVcBRwJ3StpBqXtqVdrJ5oi4\nJiL6IqKvt7e3zpLb37tnvTa1fe22J5pciZkVRb2hMB+YDvwB8C7gncnPatYDMyQdLqkHWERpgh4A\nIuLpiDgwIqZFxDTgHuCUiOgf4zZ0rOWLZld8z91IZpaFWmMf7SnpHOCvKd2b8POIeHj3o9pnI2IX\ncDawBtgCfDMiHpB0kaRTGlR/x6t00hncjWRmjddd4/0bKM3P/E+8fGnpx+tdeUSsBlaPaPt0hWXn\n1Lveopk1dV82DT49qt3dSGbWaLW6j2ZGxAeSqTffB7ytCTXZCCvPPqHie+5GMrNGqhUKL+x+knQH\nWU6qdSP5pjYza5RaofCfJf06efwGeNPu58k8C9ZEs6amjy7im9rMrFFqzafQlcynsHtOhe6y555P\nocncjWRmWRvLgHjWAqp1Ix3uYDCzCXIotKFK03cGcM6Kjc0txsw6ikOhDS1dcATdFfbcyk2Ppb9h\nZlYHh0KbGrikcjeSzy+Y2Xg5FNpYtfMLDgYzGw+HQpurNGge+P4FMxs7h0KbW75odsWd6PsXzGys\nHAodYLu7kcysQRwKHcLnF8ysERwKHcTBYGYT5VDoMJXGRwIHg5nV5lDoMCvPPiF1cuzdZl24pmm1\nmFn7cSh0oIeqdCM99dwulq3e0sRqzKydOBQ6VLXzC1et3d7ESsysnTgUOphPPJvZWDkUOpyDwczG\nwqFQALd+5C0V33MwmFk5h0IBHH3YflXHSHIwmNluDoWCWL5oNr2Teyq+72AwM3AoFMr6C+bS01X5\nLgYHg5k5FArmZ59bQJVccDCYFZxDoYAe/PzCqjvewWBWXA6Fgtq+zMFgZqM5FAqsnmA4Z8XGptVj\nZvlzKBTc9mUL6a7yr2Dlpsc44oIfNK8gM8uVQ8EYuGQhU/bqrvj+c7tecneSWUE4FAyATZ85ueoN\nbuDzDGZFkGkoSJonaaukAUlLU97/hKTNku6VdLukw7Ksx6pbvmh2
1bGSwMFg1ukyCwVJXcAVwHxg\nJrBY0swRi20E+iLiTcC3gC9mVY/Vr55g8JwMZp0pyyOFY4CBiNgeETuBFcCp5QtExB0R8dvk5T3A\n1AzrsTGoFQxXrd3OdB81mHWcLEPhYODRsteDSVslZwK+zKWF1AqGl3B3klmnyTIU0gZTiNQFpQ8A\nfcCXKry/RFK/pP6hoaEGlmi11AoGKAXDzeseaUI1Zpa1LENhEDik7PVU4LGRC0n6Q+CTwCkR8f/S\nVhQR10REX0T09fb2ZlKsVbZj2UJOnHFg1WXO/8597k4y6wBZhsJ6YIakwyX1AIuAVeULSJoNXE0p\nEB7PsBaboBvPPNbdSWYFkFkoRMQu4GxgDbAF+GZEPCDpIkmnJIt9CZgM/L2kTZJWVVidtYh6u5N8\nF7RZe1JEajd/y+rr64v+/v68yyi8WReu4anndtVcrp4QMbPsSdoQEX01l3Mo2ETU013UPak0lIaZ\n5afeUPAwFzYh9RwJ7HrJVyiZtQuHgk3YjmULa46bBKUrlHwi2qy1ufvIGqreP/pdKs0AZ2bN4e4j\ny0W9Rw0vRilA5l56Z/ZFmVndHArWcPWMtrrbtqFnPcCeWQtx95FlbiznEXwJq1k23H1kLWPHsoX0\nTu6pa9lpS7/vk9FmOfKRgjXVWP/gn3XidJYuOCKjasyKwzevWUsbazj0Tu5h/QVzM6rGrPM5FKwt\njDUcBDzk8w5mY+ZQsLYynvMIPiltVj+HgrWl8YTDXt2T2HLx/AyqMescDgVra+O9AslHD2bpHArW\nESZyeaoDwuxlDgXrKNOXfp+XJvB5B4QVnUPBOtKy1Vu4au32Ca3DAWFF5FCwjteIO58v+aM3cvqx\nhzagGrPW5lCwwtjw8JO898qfNGRdPoqwTuVQsEI6YdntDD71fEPW5UtdrZM4FKzwjvz0D3lm54sN\nW58nBrJ25lAwK9PII4hy7m6yduFQMKsiy+G5HRTWihwKZnV63fnfZ9dEboKow6yp+7Ly7BOy/RKz\nKhwKZuPUzEl+ZvTuzW3nzmna91lxORTMGiSvmeA8wZA1kkPBLCPnrNjIyk2P5V2Gz13YmDgUzJoo\nq6ubJqqnS/zscwvyLsNagEPBrAXk1fU0Xj766FwOBbMW1m5hUQ8HSmtzKJi1qU4MjPGa3NPF/RfN\ny7uMjtASoSBpHvC3QBdwbUQsG/H+HsCNwNHAL4HTImJHtXU6FKzoHBr5u/Ujb+How/bLu4wxyT0U\nJHUBPwPmAoPAemBxRGwuW+ajwJsi4ixJi4A/iojTqq3XoWBWm4Ojc433Rsh6Q6F7XFXV5xhgICK2\nJwWtAE4FNpctcyrw2eT5t4DLJSnarU/LrMWMpX+/kUOPW/Y2DT7Nuy+/K7M75LMMhYOBR8teDwLH\nVlomInZJeho4AHiifCFJS4AlAIce6glRzBrp6MP2G/dJYgdKPu5/7NeZrTvLUFBK28gjgHqWISKu\nAa6BUvfRxEszs0aYSKBU4+6v6o587T6ZrTvLUBgEDil7PRUYeRvo7mUGJXUD+wK/yrAmM2sDrXp5\nayuEVdaDK2YZCuuBGZIOB34OLAJOH7HMKuCDwN3A+4Af+XyCmbWqVg2rRsosFJJzBGcDayhdknp9\nRDwg6SKgPyJWAdcBN0kaoHSEsCireszMrLYsjxSIiNXA6hFtny57/jzwx1nWYGZm9ZuUdwFmZtY6\nHApmZjbMoWBmZsMcCmZmNqztRkmVNAQ8PM6PH8iIu6ULwNtcDN7mYpjINh8WEb21Fmq7UJgISf31\nDAjVSbzNxeBtLoZmbLO7j8zMbJhDwczMhhUtFK7Ju4AceJuLwdtcDJlvc6HOKZiZWXVFO1IwM7Mq\nChMKkuZJ2ippQNLSvOsZL0mHSLpD0hZJD0j6eNK+v6TbJG1Lfu6XtEvSZcl23yvpqLJ1fTBZfpuk\nD+a1TfWS1CVpo6TvJa8Pl7Qu
qf8bknqS9j2S1wPJ+9PK1nFe0r5V0sn5bEl9JE2R9C1JP0329/Gd\nvp8l/VXy7/p+SbdI2rPT9rOk6yU9Lun+sraG7VdJR0u6L/nMZZLS5q2pLCI6/kFplNYHgelAD/Cv\nwMy86xrnthwEHJU8fxWlebBnAl8ElibtS4EvJM8XAD+gNKHRccC6pH1/YHvyc7/k+X55b1+Nbf8E\ncDPwveT1N4FFyfOrgI8kzz8KXJU8XwR8I3k+M9n3ewCHJ/8muvLerirbewPw58nzHmBKJ+9nSjMx\nPgTsVbZ/P9Rp+xk4ETgKuL+srWH7Ffhn4PjkMz8A5o+pvrx/QU3aCccDa8penwecl3ddDdq27wJz\nga3AQUnbQcDW5PnVwOKy5bcm7y8Gri5rf8VyrfagNEnT7cAfAN9L/sE/AXSP3MeUhms/PnnenSyn\nkfu9fLlWewD7JH8gNaK9Y/czL0/Pu3+y374HnNyJ+xmYNiIUGrJfk/d+Wtb+iuXqeRSl+yhtvuiD\nc6qlYZLD5dnAOuA/RMQvAJKfr04Wq7Tt7fY7WQ78d+Cl5PUBwFMRsSt5XV7/K+b+BnbP/d1O2zwd\nGAL+d9Jldq2kveng/RwRPwe+DDwC/ILSfttAZ+/n3Rq1Xw9Ono9sr1tRQqGuuaDbiaTJwK3AORFR\nbRbvStveNr8TSe8EHo+IDeXNKYtGjffaZpsp/Z/vUcCVETEbeJZSt0Ilbb/NST/6qZS6fF4L7A3M\nT1m0k/ZzLWPdxglve1FCoZ75otuGpN+hFAhfj4hvJ83/Lumg5P2DgMeT9krb3k6/k7cCp0jaAayg\n1IW0HJii0tze8Mr6h7dNr5z7u522eRAYjIh1yetvUQqJTt7Pfwg8FBFDEfEC8G3gLXT2ft6tUft1\nMHk+sr1uRQmF4fmikysXFlGaH7rtJFcSXAdsiYivlL21e75rkp/fLWs/I7mK4Tjg6eTwdA3wDkn7\nJf+H9o6kreVExHkRMTUiplHadz+KiPcDd1Ca2xtGb/Pu30X53N+rgEXJVSuHAzMonZRrORHxb8Cj\nkn4vaToJ2EwH72dK3UbHSfrd5N/57m3u2P1cpiH7NXnvN5KOS36HZ5Stqz55n3Bp4omdBZSu1HkQ\n+GTe9UxgO06gdDh4L7ApeSyg1Jd6O7At+bl/sryAK5Ltvg/oK1vXnwEDyeNP8962Ord/Di9ffTSd\n0n/sA8DfA3sk7XsmrweS96eXff6Tye9iK2O8KiOHbZ0F9Cf7eiWlq0w6ej8DFwI/Be4HbqJ0BVFH\n7WfgFkrnTF6g9H/2ZzZyvwJ9ye/vQeByRlysUOvhO5rNzGxYUbqPzMysDg4FMzMb5lAwM7NhDgUz\nMxvmUDAzs2EOBSs8SS9K2lT2qDqKrqSzJJ3RgO/dIenAia7HrJF8SaoVnqRnImJyDt+7g9J15080\n+7vNKvGRglkFyf/Jf0HSPyeP1yXtn5X035LnH5O0ORnrfkXStr+klUnbPZLelLQfIOkfkwHurqZs\nnBpJH0i+Y5OkqyV15bDJZg4FM2CvEd1Hp5W99+uIOIbSnaHLUz67FJgdEW8CzkraLgQ2Jm3nAzcm\n7Z8B7orSAHergEMBJB0BnAa8NSJmAS8C72/sJprVp7v2ImYd77nkj3GaW8p+fjXl/XuBr0taSWko\nCigNRfJegIj4UXKEsC+lyVXek7R/X9KTyfInAUcD65NJsvbi5QHRzJrKoWBWXVR4vttCSn/sTwE+\nJekNVB++OG0dAm6IiPMmUqhZI7j7yKy608p+3l3+hqRJwCERcQelCYCmAJOBtSTdP5LmAE9Eac6L\n8vb5lAa4g9IAaO+T9Orkvf0lHZbhNplV5CMFs+ScQtnrH0bE7stS95C0jtL/QC0e8bku4O+SriEB\nX42IpyR9ltKMafcCv+XlIZEvBG6R9C/AjykNFU1EbJZ0AfCPSdC8APwF8HCjN9SsFl+SalaBLx
m1\nInL3kZmZDfORgpmZDfORgpmZDXMomJnZMIeCmZkNcyiYmdkwh4KZmQ1zKJiZ2bD/Dye4+nWKvMRf\nAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fc798cc8898>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Use a throwaway one-state table so the learned Q is never overwritten\n",
    "dummy_Q = [[0]]\n",
    "n_episodes = 10000\n",
    "epsilons = []\n",
    "for episode in range(n_episodes):\n",
    "    _, epsilon = action_selection(0, dummy_Q, episode, n_episodes)\n",
    "    epsilons.append(epsilon)\n",
    "plt.plot(np.arange(len(epsilons)), epsilons, '.')\n",
    "plt.ylabel('Probability')\n",
    "plt.xlabel('Episode')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Let's redefine the `q_learning` function we had above and run it against the environment again."
   ]
  },
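  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As a reference for the code in the next cell, the update it applies after every step can be written as follows. This is the standard tabular Q-learning rule, with $s'$ as the next state, $r$ as the reward, and $d$ as a flag that equals 1 when the episode has ended (the `done` value returned by `env.step`):\n",
    "\n",
    "$$Q(s, a) \\leftarrow Q(s, a) + \\alpha \\left[ r + \\gamma \\, (1 - d) \\max_{a'} Q(s', a') - Q(s, a) \\right]$$\n",
    "\n",
    "Here $\\alpha$ is the learning rate and $\\gamma$ is the discount factor (`alpha` and `gamma` in the code). The $(1 - d)$ factor means that a terminal state contributes no future value to the target."
   ]
  },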
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
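    "def q_learning(env, alpha = 0.9, gamma = 0.9):\n",
    "    # env is wrapped by Monitor, so the raw environment is env.env\n",
    "    nS = env.env.observation_space.n\n",
    "    nA = env.env.action_space.n\n",
    "    \n",
    "    # start from random Q-values in [0, 2)\n",
    "    Q = np.random.random((nS, nA)) * 2.0\n",
    "    n_episodes = 10000\n",
    "    \n",
    "    for episode in range(n_episodes):\n",
    "        state = env.reset()\n",
    "        done = False\n",
    "        while not done:\n",
    "            # choose an action (explore or exploit); the returned epsilon is not needed here\n",
    "            action, _ = action_selection(state, Q, episode, n_episodes)\n",
    "            nstate, reward, done, info = env.step(action)\n",
    "            # move Q toward the TD target; (not done) drops the bootstrap term at terminal states\n",
    "            Q[state][action] += alpha * (reward + gamma * Q[nstate].max() * (not done) - Q[state][action])\n",
    "            state = nstate\n",
    "    return Q"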
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2017-04-26 15:09:54,069] Making new env: FrozenLake-v0\n",
      "[2017-04-26 15:09:54,081] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000000.json\n",
      "[2017-04-26 15:09:54,086] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000001.json\n",
      "[2017-04-26 15:09:54,093] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000008.json\n",
      "[2017-04-26 15:09:54,104] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000027.json\n",
      "[2017-04-26 15:09:54,121] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000064.json\n",
      "[2017-04-26 15:09:54,135] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000125.json\n",
      "[2017-04-26 15:09:54,155] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000216.json\n",
      "[2017-04-26 15:09:54,190] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000343.json\n",
      "[2017-04-26 15:09:54,238] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000512.json\n",
      "[2017-04-26 15:09:54,294] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video000729.json\n",
      "[2017-04-26 15:09:54,367] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video001000.json\n",
      "[2017-04-26 15:09:54,679] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video002000.json\n",
      "[2017-04-26 15:09:55,004] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video003000.json\n",
      "[2017-04-26 15:09:55,437] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video004000.json\n",
      "[2017-04-26 15:09:55,972] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video005000.json\n",
      "[2017-04-26 15:09:56,626] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video006000.json\n",
      "[2017-04-26 15:09:57,344] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video007000.json\n",
      "[2017-04-26 15:09:58,132] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video008000.json\n",
      "[2017-04-26 15:09:58,978] Starting new video recorder writing to /tmp/tmpwmjvyk26/openaigym.video.2.93.video009000.json\n"
     ]
    }
   ],
   "source": [
    "mdir = tempfile.mkdtemp()\n",
    "env = gym.make('FrozenLake-v0')\n",
    "env = wrappers.Monitor(env, mdir, force=True)\n",
    "\n",
    "Q = q_learning(env)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Curious to see how the new agent did? Let's check it out!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "videos = np.array(env.videos)\n",
    "n_videos = 5\n",
    "\n",
    "idxs = np.linspace(0, len(videos) - 1, n_videos).astype(int)\n",
    "videos = videos[idxs,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "urls = []\n",
    "for i in range(n_videos):\n",
    "    out = check_output([\"asciinema\", \"upload\", videos[i][0]])\n",
    "    out = out.decode(\"utf-8\").replace('\\n', '').replace('\\r', '')\n",
    "    urls.append([out])\n",
    "videos = np.concatenate((videos, urls), axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <h2>Episode 0\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/f37ai65q2i1143m93y7td4rze.js\" \n",
       "        id=\"asciicast-f37ai65q2i1143m93y7td4rze\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 64\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/bdqpkz28pasx1r6b6dv3fppol.js\" \n",
       "        id=\"asciicast-bdqpkz28pasx1r6b6dv3fppol\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 729\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/2kllm3ospnf1gu3uuw75ung4z.js\" \n",
       "        id=\"asciicast-2kllm3ospnf1gu3uuw75ung4z\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 4000\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/8pqqbovk6ewlkmf6aq9r8jgoc.js\" \n",
       "        id=\"asciicast-8pqqbovk6ewlkmf6aq9r8jgoc\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    \n",
       "    <h2>Episode 9000\n",
       "    <script type=\"text/javascript\" \n",
       "        src=\"https://asciinema.org/a/e9r86ld4hcj35c2ir3twzho9o.js\" \n",
       "        id=\"asciicast-e9r86ld4hcj35c2ir3twzho9o\" \n",
       "        async data-autoplay=\"true\" data-size=\"big\">\n",
       "    </script>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "strm = ''\n",
    "for video_path, meta_path, url in videos:\n",
    "\n",
    "    with open(meta_path) as data_file:    \n",
    "        meta = json.load(data_file)\n",
    "    castid = url.split('/')[-1]\n",
    "    html_tag = \"\"\"\n",
    "    <h2>{0}</h2>\n",
    "    <script type=\"text/javascript\" \n",
    "        src=\"https://asciinema.org/a/{1}.js\" \n",
    "        id=\"asciicast-{1}\" \n",
    "        async data-autoplay=\"true\" data-size=\"big\">\n",
    "    </script>\n",
    "    \"\"\"\n",
    "    strm += html_tag.format('Episode ' + str(meta['episode_id']),\n",
    "                               castid)\n",
    "HTML(data=strm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Did it do well? This isn't easy, so take your time. Be sure to look at the solution notebook if you want an idea.\n",
    "\n",
    "For now, let's take a look at the value function and policy the agent came up with."
   ]
  },
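  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The next two cells read both quantities off the learned table. In symbols, taking the best action value in each state gives a state-value estimate, and the action that achieves it gives a greedy policy:\n",
    "\n",
    "$$V(s) = \\max_a Q(s, a), \\qquad \\pi(s) = \\arg\\max_a Q(s, a)$$\n",
    "\n",
    "Both are derived from the learned $Q$, so they are only as good as the Q-table itself."
   ]
  },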
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0])"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "V = np.max(Q, axis=1)\n",
    "V"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0])"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pi = np.argmax(Q, axis=1)\n",
    "pi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Good? Nice!\n",
    "\n",
    "Let's wrap up!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2017-04-26 15:13:34,165] Finished writing results. You can upload them to the scoreboard via gym.upload('/tmp/tmpwmjvyk26')\n"
     ]
    }
   ],
   "source": [
    "env.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "gym.upload(mdir, api_key='<YOUR OPENAI API KEY>')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "So, this notebook shows you how agents behave when they don't have a definition of the environment. They have to interact with it, just like you and I would.\n",
    "\n",
    "Now, we are one step closer, but you are probably wondering: if this is 'model-free' reinforcement learning, are the algorithms we learned before 'model-based' reinforcement learning? Well, not really. Model-based reinforcement learning algorithms use experience, perhaps in addition to what model-free algorithms do, to come up with models of the environment. This helps in many ways; the ones worth highlighting are that algorithms can require less computation and, more importantly, less exploration. This is vital when experience is expensive to collect. Think of a robot learning to walk. What's the price of a robot collapsing onto the floor?\n",
    "\n",
    "Additionally, there should be a little thing bothering you. Isn't it disappointing to be dealing with discrete states and actions? Who are we kidding? A robot doesn't know how to go to 'state 2'!\n",
    "\n",
    "So, yeah, we have been working with discrete states and actions. That's just not the way the world works. Let's step it up a bit. In the following lessons we'll discuss what to do when states, and later actions, are continuous and perhaps too large to even store in a table the way Q is in q-learning. You ready? Let's go."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
